At the metabolite level, we can look at the pairwise comparisons for all experimental groups. In the following section we will look at the pairwise comparisons using the metabolite_pairwise function defined in the R scripts. We will first read in the analysis data needed for this analysis. Since the column names of the analysis data are numeric, we should convert all of the column names to to characters.
# 1. Read in data from the analysis above
analysis_data <- read.csv("../Data/Processed/analysis_data.csv", check.names = F)
# 2. Make the variable names charactors
names(analysis_data) <- as.character(names(analysis_data))
Once we have the analysis data loaded in, we can now look at the pairwise comparisons for each of our experimental groups. To do this we will be using the metabolite_pairwise function defined in the R scripts. This function will analyze each metabolite individually. For each metabolite, the the metabolite_pairwise function will first test if any of the experimental groups explained a significant proportion of the variance in the metabolite using an F-test. Since we will be looking at multiple comparisons for the metabolite it is good practice to first look at the over-all p-value from the F-test before looking at the pairwise comparisons.
The metabolite_pairwise function then looks at all pairwise comparisons utilizing the emmeans package. The metabolite_pairwise function returns a data frame with the metabolite, overall p-value, estimated difference in means for all experimental groups, and the p-value which test if the difference the groups is significantly different.
The metabolite_pairwise function has three arguments:
form: This is a character string the resembles the right hand side of a simple linear regression model in R. For example form = “Group1 + Group2”.
data: The analysis data we will use for the pairwise comparisons.
Metabolites: A list of metabolites that we want to be analyzed.
In the next chunk we will stratify the analysis by running the metabolite_pairwise function for each stata in our analysis data set. To do this we:
Define the variable in the analysis data that we want to stratify our analysis by.
Define the form variable for the metabolite_pairwise function.
Create a list of metabolites that we want to pairwise analysis for. By default, we are going to use all metabolites in the analysis data.
Create a for look that:
Filters the analysis_data to focus on a single strata.
Run the metabolite_pairwise function for that strata.
Save results in the “../Data/Results/Pairwise_Comparisons” folder with the file name “Pairwise_{{strata}}.
# 1. Define the variable in the analysis data that we want to stratify
strata_var <- "Gender"
# 2. Define the form variable
form <- "GROUP_NAME*TIME1"
# Create a list of metabolites
analysis_variables <- c("PARENT_SAMPLE_NAME", "GROUP_NAME",
"TIME1", "Gender")
metabolites <- names(analysis_data)[!names(analysis_data) %in% analysis_variables ]
stratas <- unique(analysis_data[,strata_var])
for (i in 1:length(stratas)) {
# 4. Filter analysis data
dat <- analysis_data[analysis_data[,strata_var]==stratas[i],]
# 5. Run the metabolite_pairwise function for the strata
strat_results <- metabolite_pairwise(form = form,
data=dat,
metabolites = metabolites)
# 6. Save results
write.csv(strat_results, file = paste0("../Data/Results/Pairwise_Comparisons/Pairwise_",
stratas[i],".csv"), row.names = F)
}
## ================================================================================
## ================================================================================
For the metabolites which had a significant overall p-value (which tested if the treatment group means were equal under the null hypothesis), we will produce a heatmap of the log fold changes. This heatmap colors will only show if the log fold-change is greater than log(2) or less than log(.5). Therefore, this heatmap will only focus on comparisons with a fold change of two or greater. To do this we will,
Read in the results from the pairwise comparisons. If you stratified the analysis, you may want to copy this chunk and rerun for each strata. Additionally, we will need to read in the chemical annotation file. When we read in the results from the pairwise comparisons, we will filter the results to only look at the metabolites with a significant overall p-value.
Merge the chemical annotation fill with the results from the pairwise comparisons.
Create the heatmap. In this step we are doing some pre-precessing which removes the p-value columns and then transforms the remaining fold change results from wide to long format. We then mutate the logFoldChanges to only display when the fold change is greater than 2 or less than .5. Then we use plotly to create an interactive heatmap.
Display heatmap
Save heatmap in “…/Outputs/Plots” with the file name “PaiwiseComp_EST_heatmap”. Since this plot is interactive, it will be save as an html file that can be opened in a web browser.
# 1. Read in pairwise results
results_pair <-read.csv("../Data/Results/Pairwise_Comparisons/Pairwise_Female.csv"
, check.names = F) %>%
filter(Overall_pval < 0.05)
# Read chemical annotation file
met_excel <- "../Data/Metabolon/UNAZ-0501-22VW_ DATA TABLES.xlsx"
chem_data <- read.xlsx(met_excel,sheet = "Chemical Annotation")
# 2. Merge the chemical annotation fill with the results from the pairwise comparisons.
data <- chem_data %>%
select(SUB_PATHWAY,CHEMICAL_NAME,CHEM_ID) %>%
merge(results_pair, by.x = "CHEM_ID",by.y = "Metabolite") %>%
arrange(SUB_PATHWAY)
# 3. Produce Heatmap
p = data %>%
select(CHEM_ID,SUB_PATHWAY,CHEMICAL_NAME, all_of(names(data)[grepl("EST",names(data))])) %>%
melt(id.vars = c("CHEM_ID","SUB_PATHWAY","CHEMICAL_NAME"), variable.name = "Contrast",
value.name = "logFoldChange") %>%
mutate(Contrast = gsub("_ESTS","",Contrast),
logFoldChange = ifelse(logFoldChange < log(0.5) | logFoldChange > log(2),
round(logFoldChange,3), NA)) %>%
plot_ly(
type = "heatmap",
x= ~Contrast,
y = ~CHEMICAL_NAME,
z = ~logFoldChange,
text = ~SUB_PATHWAY,
hovertemplate = paste("<b>Metabolite: %{y}</b><br><br>",
"Sub-pathway: %{text}<br>",
"Contrast: %{x}<br>",
"Log Fold-Change: %{z}<br>",
"<extra></extra>"),
colorbar = list(title ="<b>Log Fold-Change</b>")) %>%
layout(title = "<b>Log Fold-Change Heatmap</b>",
xaxis = list(title="<b>Contrasts</b>"),
yaxis = list(title = ""))
# 4. Display heatmap
p
# 5. Save.
#htmlwidgets::saveWidget(p,paste0("../Outputs/Plots/PaiwiseComp_EST_heatmap_",".html"))
For the metabolites which had a significant overall p-value (which tested if the treatment group means were equal under the null hypothesis), we will now produce a heatmap based on the pairwise comparison p-values. To do this we will:
Read in the results from the pairwise comparisons. If you stratified the analysis, you may want to copy this chunk and rerun for each strata. Additionally, we will need to read in the chemical annotation file. When we read in the results from the pairwise comparisons, we will filter the results to only look at the metabolites with a significant overall p-value.
Merge the chemical annotation fill with the results from the pairwise comparisons.
Create the heatmap. In this step we are doing some pre-precessing which removes the log fold change columns and then transforms the remaining p-value results from wide to long format. Then we use plotly to create an interactive heatmap.
Display heatmap
Save heatmap in “…/Outputs/Plots” with the file name “PaiwiseComp_p_value_heatmap”. Since this plot is interactive, it will be save as an html file that can be opened in a web browser.
# 1. Read in pairwise results
results_pair <-read.csv("../Data/Results/Pairwise_Comparisons/Pairwise_Female.csv"
, check.names = F) %>%
filter(Overall_pval < 0.05)
# Read chemical annotation file
met_excel <- "../Data/Metabolon/UNAZ-0501-22VW_ DATA TABLES.xlsx"
chem_data <- read.xlsx(met_excel,sheet = "Chemical Annotation")
# 2. Merge the chemical annotation fill with the results from the pairwise comparisons.
data <- chem_data %>%
select(SUB_PATHWAY,CHEMICAL_NAME,CHEM_ID) %>%
merge(results_pair, by.x = "CHEM_ID",by.y = "Metabolite") %>%
arrange(SUB_PATHWAY)
# 3. Produce Heatmap
p = data %>%
select(CHEM_ID,SUB_PATHWAY,CHEMICAL_NAME, all_of(names(data)[grepl("PVALS",names(data))])) %>%
melt(id.vars = c("CHEM_ID","SUB_PATHWAY","CHEMICAL_NAME"), variable.name = "Contrast",
value.name = "P_value") %>%
mutate(Contrast = gsub("_PVALS","",Contrast),
P_value = round(P_value,3)) %>%
arrange(SUB_PATHWAY) %>%
plot_ly(
type = "heatmap",
x= ~Contrast,
y = ~CHEMICAL_NAME,
z = ~P_value,
text = ~SUB_PATHWAY,
hovertemplate = paste("<b>Metabolite: %{y}</b><br><br>",
"Sub-pathway: %{text}<br>",
"Contrast: %{x}<br>",
"P-Value: %{z}<br>",
"<extra></extra>"),
colorbar = list(title ="<b>P-value</b>")) %>%
layout(title = "<b>P-Value Heatmap</b>",
xaxis = list(title="<b>Contrasts</b>"),
yaxis = list(title = ""))
# 4. Display heatmap
p
# 5. Save.
#htmlwidgets::saveWidget(p,paste0("../Outputs/Plots/PaiwiseComp_p_value_heatmap_",".html"))